Judging Grammaticality with Tree Substitution Grammar Derivations
Abstract
In this paper, we show that local features computed from the derivations of tree substitution grammars (TSGs), such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
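The two kinds of derivation features named in the abstract (fragment identity, and counts of large and small fragments) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the bracketed-string fragment encoding, the `fragment_size` measure (number of opening brackets), the `large_threshold` value, and the feature names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fragment_size(fragment: str) -> int:
    """Approximate fragment size as the number of bracketed nodes.

    Assumption: fragments are bracketed strings, so each "(" opens one
    node that has children inside the fragment.
    """
    return fragment.count("(")


def derivation_features(fragments, large_threshold=3):
    """Map a TSG derivation (a list of fragments) to sparse count features.

    Emits one identity feature per fragment plus two aggregate counts.
    The threshold separating "large" from "small" is a hypothetical choice.
    """
    feats = Counter()
    for frag in fragments:
        feats["frag=" + frag] += 1          # fragment identity feature
        if fragment_size(frag) >= large_threshold:
            feats["num_large"] += 1         # count of large fragments
        else:
            feats["num_small"] += 1         # count of small fragments
    return feats


# Toy derivation of "the dog barked" (illustrative only).
derivation = [
    "(S NP (VP (VBD barked)))",
    "(NP (DT the) NN)",
    "(NN dog)",
]
features = derivation_features(derivation)
```

A feature dictionary of this shape could then be passed to any linear binary classifier; the choice of classifier is left open here because the abstract does not name one.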
Similar resources
Judging Grammaticality with Count-Induced Tree Substitution Grammars
Prior work has shown the utility of syntactic tree fragments as features in judging the grammaticality of text. To date, such fragments have been extracted from derivations of Bayesian-induced Tree Substitution Grammars (TSGs). Evaluating on discriminative coarse and fine grammaticality classification tasks, we show that a simple, deterministic, count-based approach to fragment identification per...
D-Tree Substitution Grammars
There is considerable interest among computational linguists in lexicalized grammatical frameworks; lexicalized tree adjoining grammar (LTAG) is one widely studied example. In this paper, we investigate how derivations in LTAG can be viewed not as manipulations of trees but as manipulations of tree descriptions. Changing the way the lexicalized formalism is viewed raises questions as to the des...
Language Modeling with Tree Substitution Grammars
We show that a tree substitution grammar (TSG) induced with a collapsed Gibbs sampler results in lower perplexity on test data than both a standard context-free grammar and other heuristically trained TSGs, suggesting that it is better suited to language modeling. Training a more complicated bilexical parsing model across TSG derivations shows further (though nuanced) improvement. We conduct an...
Efficient Disambiguation by Means of Stochastic Tree Substitution Grammars
In Stochastic Tree Substitution Grammars (STSGs), one parse (tree) of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not ...
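The definition of parse probability stated in this abstract can be written out as follows. The symbols are notation chosen here for illustration: D(T) is the set of derivations that yield parse tree T, and p(f) is the probability of fragment f. The product form for a single derivation is the standard STSG assumption and does not appear in the truncated text above.

```latex
P(T) = \sum_{d \in D(T)} P(d),
\qquad
P(d) = \prod_{f \in d} p(f)
```

Because D(T) can be exponentially large, the most probable derivation and the most probable parse need not coincide.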
A General, Sound and Efficient Natural Language Parsing Algorithm based on Syntactic Constraints Propagation
This paper presents a new context-free parsing algorithm based on a bidirectional strictly horizontal strategy which incorporates strong top–down predictions (derivations and adjacencies). From a functional point of view, the parser is able to propagate syntactic constraints reducing parsing ambiguity. From a computational perspective, the algorithm includes different techniques aimed at the im...